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Abstract. We present a method for approximating the solution of a parameterized, nonlinear 
dynamical system using an affine combination of solutions computed at other points in the input pa- 
rameter space. The coefficients of the affine combination are computed with a nonlinear least squares 
procedure that minimizes the residual of the governing equations. The approximation properties of 
this residual minimizing scheme are comparable to existing reduced basis and POD-Galerkin model 
reduction methods, but its implementation requires only independent evaluations of the nonlinear 
forcing function. It is particularly appropriate when one wishes to approximate the states at a few 
points in time without time marching from the initial conditions. We prove some interesting charac- 
teristics of the scheme including an interpolatory property, and we present heuristics for mitigating 
^ the effects of the ill-conditioning and reducing the overall cost of the method. We apply the method 

^ to representative numerical examples from kinetics — a three state system with one parameter con- 

p-j- trolling the stiffness - and conductive heat transfer - a nonlinear parabolic PDE with a random field 

model for the thermal conductivity. 

oo 
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1. Introduction. As computational capabilities grow, engineers and decision 
makers increasingly rely on simulation to aid in design and decision-making pro- 
cesses. However, the complexity of the models has kept pace with the growth in 
computing power, which has resulted in expensive computer models with complicated 
I " I parametric dependence. For given parameter values, each costly evaluation of the 

model can require extensive time on massively parallel, high performance systems. 
04 Thus, exhaustive parameter studies exploring the relationships between input param- 

eters and model outputs become infeasible; cheaper reduced order models are needed 
for sensitivity /uncertainty analysis, design optimization, and model calibration. 

Reduced order modeling has become a very active field of research. Methods 
based on the Proper Orthogonal Decomposition (POD) have been shown to dramat- 
ically reduce the computational complexity for approximating the solution of linear 
T— I dynamical systems or parameterized linear steady state problems [U 1221 EH 13 ■ The 

success of such methods for linear models has spurred a slew of recent work on model 
reduction for nonlinear models |S1 [71 [T31 UHl HH] . In this vein, we focus on nonlinear 
^ dynamical systems that depend on a set of input parameters, but the method we 

develop can be modified for steady and/or linear parameterized models as well. The 
parameters may affect material properties, boundary conditions, forcing terms and/or 
model uncertainties; we do not consider parameterized initial conditions. Suppose we 
can afford to compute a full model solution at a few points in the parameter space, 
or equivalently, suppose we have access to a database of previously computed runs. 
How can we use those stored runs to cheaply estimate the model output at untested 
input parameter values? 

In this paper, we propose an interpolatiorj^ method that employs an affine com- 



> 

in 



*Department of Mechanical Engineering, Stanford University, Stanford, California 94305 
(paul . constant ineSst anf ord. edu). 

t Department of Aeronautics and Astronautics, Massachusetts Institute of Technology, Cambridge, 
Massachusetts 02139 (qiqiamit.edu). 

^In the work of 2, 7 18 , interpolation occurs in the spatial domain; our approximation interpo- 
lates in the parameter domain. 



1 



2 



P. G. CONSTANTINE AND Q. WANG 



bination of the stored model evaluations and a nonlinear least squares procedure for 
computing the coefficients of the afRne combination such that the equation residual is 
minimized. If one is interested in approximating the state vector at a new parameter 
value at only a few points in time, then constructing and solving the least squares 
problem will be significantly cheaper than solving the true model; this justifies the 
comparison to reduced order modeling. However, if one wishes to approximate the 
full time history of a few elements of the state vector, then the least squares problem 
may be more expensive to solve than the true model. We will highlight the former 
situation in the second numerical example. 

Like standard ODE solvers, the method only requires evaluations of the forc- 
ing function from the dynamical system. Thus, implementation is straightforward 
using existing codes. However, unlike ODE solvers, the function evaluations can be 
performed independently to take advantage of parallel architectures. The model inter- 
polation scheme itself has many appealing features including point-wise optimality by 
construction, reuse of stored model evaluations, and a strategy for adaptively choosing 
points in the parameter space for additional full model evaluations. We observe that 
the error in approximation decays like the eigenvalues of a covariance-like operator of 
the process as more model evaluations added. But we show that achieving a small 
residual requires the solution of an ill-conditioned least squares problem; we present 
a heuristic for taming the potentially unwieldy condition number. 

The linear model of the data is a common feature of many model reduction 
and interpolation schemes including reduced basis methods [HI EH [HI [5], krig- 
ing surfaces/Gaussian process emulators [221 (El [Il]j and polynomial interpolation 
schemes [31[2H]- Unlike the kriging and polynomial schemes, our process utilizes the 
governing equations to construct the minimization problem that yields the coefficients, 
which invariably produces a more accurate interpolant; this is comparable to the re- 
duced basis methods and POD-based model reduction techniques [3[T]. However, in 
contrast to reduced basis methods and POD-based techniques, we formulate the least 
squares problem using only the equation residual, which requires minimal modifica- 
tions to the full system solvers. Additionally, this residual-based formulation applies 
directly to nonlinear models, which have posed a persistent challenge for schemes 
based on Galerkin projection. 

In Section [2] we pose the model problem and derive the residual minimizing 
scheme, including some specific details of the nonlinear least squares solver. We 
briefly show in Section [3| how kriging interpolation relates to the least squares pro- 
cedure. In Section [4j we prove some properties of the residual minimizing scheme, 
including a lower bound on the average error, an interpolatory property, and some 
statements concerning convergence. Section [5] presents heuristics for adding full model 
evaluations, reducing the cost of the scheme, and managing the ill-conditioning in the 
least squares problems. In Section [6j we perform two numerical studies: (i) a simple 
nonlinear dynamical system with three state variables and one parameter control- 
ling the stiffness, and (ii) a two-dimensional nonlinear parabolic PDE model of heat 
transfer with a random field model for thermal conductivity. Finally we conclude in 
Section [7] with a summary and directions for future work. 

2. Residual Minimizing Model Interpolation. Let x{t) — x{t, s) be a M^'- 
valued process that satisfies the dynamical system 



(2.1) 
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with initial condition x{0) = xq. The space S C M'' is the input parameter space, 
and we assume the process is bounded for ah s G 5 and t e [0,r]. Note that the 
parameters affect only the dynamics of the system - not the initial condition. The 
]RP-valued function / is nonlinear in the states x, and it often represents a discretized 
differential operator with forcing and boundary terms included appropriately. We 
assume that / is Lipschitz continuous and its Jacobian with respect to the states is 



nonsingular, which excludes systems with bifurcations. The process satisfying (2.11 
is unique in the following sense: Let y{t) be a differentiable M^'-valued process with 
y{0) = Xq, and define the integrated residual 4'{y) as 

Hy) = / \\y' ~ fiy,t,.s)f dt, (2.2) 



where || • |j is the standard Euclidean norm. If (j){y) — 0, then clearly y{t) — x{t) for t G 
[0,T]. This residual-based definition of uniqueness underlies the model interpolation 
method. Given a discretization of the time domain < < ■ • • < tm ^ we define 
the discretized residual (pmiy) as 

m 

m ~ <t^m{y) = Y."^' \\y'iti)-fiyiu),k,s)\\\ (2.3) 

i=l 

where wf is the integration weight associated with time t^; we write the weights as 
squared quantities to avoid the cumbersome square root signs later on. Note that the 



time discretization of (2.3) may be a subset of the time grid used to integrate x{t). 
In fact, the ti may be chosen to select a small time window within [0,T]. Again, it is 
clear that if 4>m{y) = 0, then y{ti) = x(ti). 

In what follows, we describe a method for approximating the process x{t) — x(t, s) 
using a set of solutions computed at other points in the input parameter space. The 
essential idea is to construct an afline combination of these precomputed solutions, 
where the coefficients are computed with a nonlinear least squares minimization pro- 
cedure on the discretized residual (f>miy)- It may not be immediately obvious that 
the approximation interpolates the precomputed solutions; we will justify the label 
interpolant with Theorem |4.2| 

Let Xj{t) = x(t, Sj) be the time dependent process with input parameters Sj € S 



for j = 1, . . . ,n; these represent the precomputed evaluations of (2.1) for the input 
parameters Sj, and we refer to them as the bases. To approximate the process x{t) = 
x{t,s) for a given s, we seek constants aj — aj(s) that are independent of time such 
that 

n 

x(t) w x{t) = ^ajXjit) = X{t)a, (2.4) 

where a is an n- vector whose jth element is aj, and X{t) is a time dependent matrix 
whose jth column is Xj(t). By definition, the coefficients of the affine combination 
satisfy 



3 = 1 



e^a, (2.5) 



where e is an n- vector of ones. This constraint ensures that the approximation exactly 
reproduces components of x(t) that do not depend on the parameters s. To see this. 
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let xit) be some component of the state vector x{t, s) that is independent of s. Then 



(2.6) 



For example, (2.5) guarantees that the approximation satisfies parameter indepen- 



dent boundary conditions, which arise in many applications of interest. Since the 
coefficients a are independent of time, the time derivative of the approximation can 
be computed as 



n 

= F{t)a, 

where F(t) is a time dependent matrix whose jth column is f{xj,t,Sj). Note that 
this can be modified to include a constant or time dependent mass matrix, as well. 

With an eye toward computation, define the matrices Xi — X{ti) and Fi — F{ti). 
Then the discretized residual (j)mix) becomes 



4>m{i) = - f{i{ti), s)\\' 



i=l 



i=l 

= P{a) 

To compute the coefficients a of the approximation, we solve the nonlinear least 
squares problem 



minimize p(a) 

a 

subject to e^a = \. 



(2.7) 



Note that the minimizcr of (2.71 may not be unique due to potential nonconvexity in 



/, but this should not deter us. Since we are interested in approximating x{t), any 
minimizer - or near minimizer - of (2.7) will be useful. 



One could use a standard optimization routine |20j for a nonlinear objective with 



linear equality constraints to solve (2.7 1. However, we offer some specifics of a non- 



linear least squares algorithm tuned to the details of this particular problem, namely 

(i) a single linear equality constraint representing the sum of the vector elements, 

(ii) the use of only evaluations of the forcing function to construct the data of the 
minimization problem, and (iii) the ill-conditioned nature of the problem. 

2.1. A Nonlinear Least Squares Solver. To simplify the notation, we define 
the following quantities: 



F = 



wiFi 



ip{a) = 



Wif{Xia,ti,s) 

Wmf{Xma,tm,s)^ 



(2.8) 
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The residual h 



^mp -g (;jggi^ed as 

h{a) ~ Fa — ip{a) 



so that the objective function p{a) from (2.7) can be written as 

pia) = \\h{a)r. 



(2.9) 



(2.10) 



Let J = J (a) e K™px" be the Jacobian of h{a). Given a guess G M" such that 
e^Qk = 1, the standard Newton step is computed by solving the constrained least 
squares problem 



minimize 

s 

subject to 



\JkS + h{ak)\\ 
= 0, 



(2.11) 



(2.12) 



where jT/c = J^{ak)- Let dk be the minimizer, so that the update becomes 

O/c+l = flfe + Sk- 

The constraint on 6 ensures that ak+i sums to one. 

Instead of using the standard Newton step (2.11 ), we wish to rewrite the problem 
slightly. Writing it in this alternative form suggests a method for dealing with the 
ill-conditioned nature of the problem, which we will explore in Section |5.2| We plug 



the update step (2.12) directly into the residual vector and exploit the constraint 

e^Qk+i = 1 as 



JkSk + h{ak) = Jk{ak+i 

= Jk^k+l ^ 
= Jk^k+l ^ 
= Rkttk+l 



- Qk) + h{ak) 
{h{ak) - Juak) 
{h{ak) - Jkak)e^ak+i 



where 



Rk^Jk + iKO'k) - Jkak)e^ 



(2.13) 



Written in this way, each Newton iterate ak+i can be computed by solving the con- 
strained least squares problem 



\Rka\\ 



(2.14) 



minimize 

a 

subject to a = 1. 

This is the heart of the residual minimizing model interpolation scheme. Notice that 
we used the standard Newton step in (2.12) without any sort of globalizing step 
length [5]. To include such a globalizer, we simply solve 2.14 for ak+i and compute 

5k from \2.12\ . 

We can relate the minimum residual of ( 2.14 ) to the true equation residual. Define 
''fc+i RkO-k+i, then 



||?'fe+i|| < ||Jfc||||afc+i - afcll + ^fpiok) 



(2.15) 



Near the solution, we expect jT/c to be bounded and the difference in iteration to be 
small, so that the first term is negligible. We will use this bound to relate the norm 
of the equation residual to the conditioning of the constrained least squares problem 
in Theorem 14.61 
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2.1.1. Jacobian-free Newton Step. To enable rapid implementation, we next 
show how to construct a finite difference Jacobian for the nonlinear least squares using 
only evaluations of the forcing function / from (2.1 1. We can write out J as 

J ^^,h ^ F-JfX, (2.16) 

where 



(2.17) 



Notice that the Jacobian of h with respect to a contains terms with the Jacobian of / 
with respect to the states x multiplied by the basis vectors. Thus, we need only the 
action of Jf on vectors, similar to Jacobian-free Newton-Krylov methods 14|. We 
can approximate the action of the Jacobian of / on a vector with a finite difference 
gradient; let Xj be the jth column of X, then 





WiVa;/(Xia,ti,s) 












Wrn^ xf{Xma,tm,s)_ 


X = 





Wif(Xia + £Xj{ti),ti,s) 



wif{Xia,ti,s) 



(2.18) 



We can assume that the terms f{Xia,ti,s) were computed before approximating the 
Jacobian to check the norm of the residual. Therefore, at each Newton iteration 
we need nm evaluations of / to compute the approximate Jacobian - one for each 
basis at each point in the time discretization. The implementation will use this finite 
difference approximation. But for the remainder of the analysis, we assume we have 
the true Jacobian. 

2.1.2. Initial Guess. The convergence of nonlinear least squares methods de- 
pends strongly on the initial guess. In this section, we propose an initial guess based 
on treating the forcing function / as though it was linear in the states. To justify this 
treatment, let f{x) = f{x, t, s) for given t and s. We take the Taylor expansion about 
the approximation x as 

/(x) = /(i) + V,/(i)(x-5) + ... (2.19) 
Evaluate this expansion at Xj ~ Xj{t), multiply it by aj, and sum over j to get 




= 1 



= 



The constraint eliminates the first order terms in Taylor expansion. If we ignore the 
higher order terms, then we can approximate 



f{x,t,s) « ^ajf{xj,t,s) = G{t)a, 



(2.20) 



RESIDUAL MINIMIZING MODEL INTERPOLATION 



7 



where the jth column of G{t) is f{xj,t,s). Let Gi — G(ti), and define the mp x n 
matrix G as 



G 



wiGi 



(2.21) 



Then to solve for the initial guess oq, we define 

= F - G 



(2.22) 



and solve (2.14). In words, we treat / as linear in the states and solve the same 
constrained least squares problem. Of course, if / is actually linear, then this is 
the only step in the approximation; no Newton iterations are required. In fact, we 
show in Theorem |4.3| that using this procedure alone to compute the coefficients a 
also produces an interpolant. Therefore, if the system is locally close to linear in the 
states, then a small number of Newton iterations will be sufficient. 

3. Comparison to Kriging Interpolation. In this section, we compare the 
quantities computed in the residual minimizing scheme to kriging interpolation, which 
is commonly used in geostatistics. Suppose that x{t^ s) is a scalar valued process with 
t e [0, T] and s e 5. Following kriging nomenclature, we treat [0, T] as the sample 
space and i as a random coordinate. Assume for convenience that x(i, s) has mean 
zero at each s £ S, 



E[x{s)] = [ x{t,s)dt = 0, seS. 
Jq 

The covariance function of the process is then 

cov(si,Sj) = E[x{si)x{sj)]. 



(3.1) 



(3.2) 



Next suppose we are given the fixed values Xj = x{sj), and we wish to approximate 
X = x{s) with the affine model 



(3.3) 



This is the model used in ordinary kriging interpolation [8]. To compute the coeffi- 
cients Qj, one builds the following linear system of equations from the assumed co- 
variance function (|3.2[) (which is typically a model calibrated with the {sj, Xj} pairs). 



'A e 




a 




'b 







X 




1 



Define Aij = cov(si, Sj) and bi = cov(si, s). Then the vector of coefficients a satisfies 



(3.4) 



where A is the Lagrange multiplier. 

Next we compare the kriging approach to model interpolation on a scalar valued 
process. Let Xj be the vector whose ith element is x(ii, Sj) with i = 1, . . . , m; in other 
words, Xj contains the time history of the process x with input parameters Sj. Again, 
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assume that the temporal average of x{t, s) is zero for all s € 5. The affine model to 
approximate the time history for x — x(s) is 



7 . ^i'^j 



Xa, 



T 

e a 



(3.5) 



where X is a matrix whose jth column is Xj. Notice the similarity between (3.31 



and (3.51. One way to compute the coefficients a in the spirit of an error minimizing 



scheme would be to solve the least squares problem 

minimize \\Xa ^ x\\ 

a 

subject to e^a = 1. 



(3.6) 



Ignore for the moment that solving this least squares problem requires x - the exact 
vector we are trying to approximate. The KKT system associated with the constrained 
least squares problem is 



(3.7) 



But notice that each element of X^ X is an inner product between two time histories, 
and this can be interpreted as approximating the covariance function with an empirical 
covariance, i.e. 



'X^X e 




a 




'X^x 







A 




1 



-'k -^3 



i=l 



x{ti,Sk)x{ti,Sj) « (m - 1) cov(sfc, Sj)- 



(3.8) 



Similarly for the point s, 



= '^x{ti,Sk)x{ti,s) sa (m- l)cov(sfc,s). 



(3.9) 



Then, 



m — 1 



m — 1 



-X^xKi 6, 



(3.10) 



where A and h are from (3.4 1. We can multiply the top equations of (3.71 by l/(m— 1) to 



transform it to an empirical form of (3.4). Loosely speaking, the residual minimizing 



method for computing the coefficients a is a transformed version of the least squares 



problem (3.6), where the transformation comes from the underlying model equations. 
In this sense, we can think of the residual minimizing method as a version of kriging 
where the covariance information comes from sampling the underlying dynamical 
model. 

4. Analysis. We begin this section with a summary its results. We first exam- 
ine the quality of a linear approximation of the process and derive a lower bound for 
the average error over the input parameter space; such analysis is related to the error 
bounds found in POD-bascd model reduction [llES]- We then show that the residual 
minimizing model interpolates the problem data; in fact, even the initial guess pro- 
posed in Section [2.1.2| has an interpolatory property in most cases. We then show that 
the coefficients inherit the type of input parameter dependence from the underlying 
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dynamical model. In particular, if the dynamical system forcing function depends 
continuously on the input parameters, then so do the coefficients; the interpolatory 
property and continuity of the coefficients are appealing aspects of the scheme. Next 
we prove that each added basis improves the approximation over the entire parameter 
space. Finally, the last theorem of the section relates the minimum singular value of 
the data matrix in the Newton step to the equation residual; this result implies that 
to achieve an approximation with a small residual one must solve an ill-conditioned 
least squares problem. 

4.1. An Error Bound. Before addressing the characteristics of the interpolant, 
we may ask how well a linear combination of basis vectors can approximate some 
parameterized vector. The following theorem gives a lower bound on the average 
error in the best n-term linear approximation in the Euclidean norm. This bound 
is valid for any linear approximation of a parameterized vector - not necessarily the 
time history of a parameterized dynamical system - and thus applies to any linear 
method, including those mentioned in the introduction. The lower bound is useful 
for determining a best approximation. If our approximation behaves like the lower 
bound as n increases, then we can claim that it behaves like the best approximation. 

Theorem 4.1. Letx = x{s) be a param,eterized p-vector, and let X be a full rank 
matrix of size p x n with n < p. Define the symmetric, positive semi-definite matrix 



C 



= j xx'^ds (4.1) 



and let C = U be its eigenvalue decomposition, where 9i > ■ ■ ■ > 6p are the 
ordered eigenvalues. Then 



min liXa — x\[ ds > 



S 



\ 



k=n+l 



Proof. For a fixed s with x = x{s), we examine the least squares problem 

minimize \\Xa — x\\. (4.3) 

a 

Using the normal equations, we have 

a = {X'^Xy'^X'^x. (4.4) 
Then the minimum residual is given by 

min||Xa-a;|| = \\{X {X"^ Xy^ X"^ - I)x\\ = \\Bx\\, (4.5) 

a 

where B = X{X'^ X)-'^X'^ - I. Denote the Frobenius norm by || • H^^. Taking the 
average of the minimum norm squared, we have 

Bx\\^ds= [ x'^B'^Bxds (4.6) 

5 Js 



L 



> 



Bxx'B'Wpds (4.7) 

5 

B (^J xx'^ds^ B^ (4.8) 

BCB^Wf, (4.9) 
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where (4.8) comes from Jensen's inequality. Suppose that X contains the first n 

(4.10) 



columns of U , i.e. partition 

U =[X Y] 
Similarly partition the associated eigenvalues 



e 



(4.11) 



Then since X and Y are orthogonal, 

B = X{X'^X)-^X^ - I 
= XX^ - I 



In this case 



-YY' 



\BCB'^\\f = wyy^cyy'^Wf 

= \\YQ2Y'^\\f 



\ ^' 

\ k=n+l 



Therefore, for a given X (not necessarily the eigenvectors). 



IXa-xlVds > WBCB^ 



> 



\ 



02 



(4.12) 



k=p+l 



as required. □ 

In words, Theorem |4. 1 1 states that the average optimal linear approximation error 
is at least as large as the norm of the neglected eigenvalues of covariance-like matrix 
C. As an aside, we note that ii p = n, then the best approximation error is zero, since 
X is an invertible matrix. 

The result in Theorem |4.1| is similar in spirit to the approximation properties 
of POD-Galerkin based model reduction techniques [T] and the best approximation 
results for Karhunen-Loeve type decompositions |17j . Given a matrix of snapshots 
X whose jth column is x{sj), one can approximate the matrix C ~ n^^XX'^; the 
error bounds for POD-based reduced order models are typically given in terms of the 
singular values of X. We can approximate this lower bound for a given problem and 
compare it to the error for the residual minimizing reduced model; we will see one 
numerical example that the approximation error behaves like the lower bound. 

4.2. Interpolation and Continuity. A few important properties arc immedi- 
ate from the construction of the residual minimizing model. Existence follows from 
existence of a minimizer for the nonlinear least squares problem (2.7). Also, by con- 



struction, the coefficients a provide the optimal approximation in the space spanned 
by the bases, where optimality is with respect to the surrogate error measure given by 
the objective function of (2.7) - i.e., the residual - under the constraint that e^a = 1. 
As in other residual minimizing schemes, the minimum value of the objective function 
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provides an a posteriori error raeasure for the approximation. The next few theorems 
expose additional interesting properties of the residual minimizing approximation. 

Theorem 4.2. The residual minimizing approximation x(t,s) interpolates the 
basis elements, i.e., 



x{ti,Sj) ^ Xj{ti), i = l,...,m, j = l,...,n. 



(4.13) 



Proof. Let a be a vector such that e^a — 1 and 

X.,a = x.j{ti), F^a = f{xj{ti),ti,Sj), i = l,...,m. (4-14) 
Such a vector exists; the n-vector of zeros with 1 in the jth entry satisfies these 



conditions. For p{a) = p{a,Sj) from (2.7 1, 

m 

P(a) = ^'^iW^tO'- f{Xia,ti, 3^)11"^ 
i=i 

m 

= ^i^i\\f{X]{U),h,Sj) - f{xj{t,),ti,Sj)\\'' 
i=i 

which is a minimum. Then 

x{ti,Sj) = XiO = Xj{t^), i = l,...,m. 
Since j was arbitrary, this completes the proof. □ 



(4.15) 



Theorem 4.2 justifies the label interpolant for the approximation x{t, s). For many 



cases, this property extends to the initial guess described in Section [2.1. 2[ 

Theorem 4.3. Let = i?_i(s) he defined as in (2.22). Assume the matrix 



[R_i,e] has full column rank for all s — sj. Then the initial guess, denoted by 
xo(t,s), interpolates the basis elements, i.e., 

xo{t.„Sj) ^ Xj{ti), z = l,...,TO, j = l,...,n. 



(4.16) 



Proof. The full rank assumption on [R_^,e] at each s = Sj implies that the 



constrained least squares problem 



minimize ||i?_ia|| 

a 

subject to e^a—1 



(4.17) 



has a unique solution. Let a be a vector of zeros with 1 in the jth element. Then 
e-^a — 1 and 



\R-ia\\ = \\Fa~Ga\\ 

Wlf{Xj{tl),tl,Sj) 



Wlf{Xj{ti),ti,Sj) 



'^rnfi'^j {tm ) ; ; -^j ) 

which is a minimum. By the uniqueness, 

xo{ti,s.j) = XiO = Xj{ti), i = l,...,TO. 



0, 



(4.18) 
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Since j was arbitrary, this completes the proof. □ 

The proof required the fuU rank assumption on [R'^i 



at each s = Sj. We 



expect this to be the case in practice with a properly selected bases Xj. If we extend 
this assumption to all s G 5, then we make a statement about the how the coefficients 
of the interpolant behave over the parameter space. In the next theorem, we show 
that the coefficients of the residual minimizing interpolant inherit the behavior of the 
model terms with respect to the input parameters. 

Theorem 4.4. Let i?^ = Rk{s) be defined as in (2.13), and assume the matrix 



[i?^, e]"^ has full column rank for all s G S . If the function f{x, t, s) and its Jacobian 
Va;/ depend continuously on s, then the coefficients a(s) computed with a finite number 
of Newton iterations are also continuous with respect to s. 

Proof. We can write the KKT system associated with the constrained least squares 



problem (2.141 as 



'RlRk e 




a 




0" 







A 




1 



(4.19) 



where A = A(s) is the Lagrange multiplier. The full rank assumption on [R^,i 
implies that the KKT matrix is invertible for all s G S, and we can write 



a 




RkRk 


e 


-1 


o" 


A 











1 



(4.20) 



Since the elements of Rk are continuous in s, so are the elements of the inverse of the 
KKT system, which implies that a(s) is continuous in s, as required. □ 

4.3. Convergence and Conditioning. The convergence analysis we present is 
somewhat nonstandard. As opposed to computing an a priori rate of convergence, we 
show in the next theorem that the minimum residual decreases monotonically with 
each added basis. 

Theorem 4.5. Let 



Xr, 



(4.21) 



be a given basis with n columns, and let a* be the minimizer of (2.14) with associated 
minimum function value p* = p(a* , s). Let s„+i G S and 



Xn+l 



x{ti,Sn+i) 



Define the updated basis 



(4.22) 



(4.23) 



Let a*_(_]^ be the minimizer of (2.14) with the basis X^'^~^^\ and let = p{a*^_^_i,s) 
be the minimum function value. Then there is an < a < 1 such that 



Pn+i = ap^, 



(4.24) 
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for all s Cz S . 

Proof. Define 



and note that e^a = 1. Then 



"o" , (4.25) 



P*n+i < P(a,s) = pl (4.26) 
Next let s — s„_|_i, and let a be a n + 1-vector of zeros with a 1 in the last entry. Then 

Pn+i - p(a,s„+i) = 0, (4.27) 

as required. □ 

In words, the minimum residual decreases monotonically as bases are added to 
the approximation. Since s is arbitrary, Theorem |4.5| says that each basis added to 
the approximation reduces the residual norm globally over the parameter space; in 
the worst case, it does no harm, and in the best case it achieves the true solution. 
This result is similar to Theorem 2.2 from [S]. 

In the final theorem of this section, we show that achieving a small residual 
requires the solution of an ill-conditioned constrained least squares problem for the 
Newton step. To do this, we first reshape the constrained least squares problem using 
the standard null space method (4) , and then we apply a result from |25j relating the 
minimum norm of the residual to the minimum singular value of the data matrix. 

Theorem 4.6. Let Rf^ be the matrix from the constrained least squares problem 



for the Newton step (2.14), and define (Jmin(-Rfc) to be its minimum singular value. 
Then 

{Rk) < Vn {\\J\\\\ak+i - ak II + ^/pM) ■ (4.28) 



Proof. Let R ~ Rk be defined as in (2.13). To solve the constrained least squares 



problem (2.14) via the nuUspace method, we take a QR factorization of the constraint 
vector 



/nei. 



(4.29) 



where ei is an n- vector of zeros with a one in the first entry. Partition Q — [qi, (52]) 
where qi is the first column of Q. The nx{n—l) matrix Q2 is a basis for the null space 
of the constraint. It can be shown that qi — n^^/^e. Using this transformation, the 
constrained least squares problem becomes the unconstrained least squares problem 



minimize \\RQ2v + \Rg\ 



(4.30) 



Then the Newton update is given by 



-1 = -e + Q2V* , 
n 



(4.31) 



where v* is the minimizer of (4.30). For a general overdetcrmined least squares 
problem min \\Au — b\\, Van Huffel [53] shows that for 7 > 0, 



< Wb-Au* 



(4.32) 
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where u* is the minimizer; we apply this result to (4.30). Taking 7 = 

1 



-Re, RQ2 



RQ. 



Since Q is orthogonal, (T^i^{RQ) = <J„^in{R)- Then 

1 



CTmin(^) < Vn 



RQ2V* 



-Re 



= ^/n\\Rak+i\ 



we have 



(4.33) 



(4.34) 



Combining this result with the bound on the residual (2.15) achieves the desired 
result. 



□ 



To reiterate, near the solution of the nonlinear least squares problem (2.7), we 
expect WJW to be bounded and the difference ||afc+i — ak\\ to be small so that the 
first term in the bound becomes negligible. Therefore, a solution with a small residual 
norm p(a) will require the solution of a constrained least squares problem with a small 
minimum singular value, which implies a large condition number. To combat this, 
we propose some heuristics for dealing with the ill-conditioning of the problem in the 
following section. 

5. Computational Heuristics. In this section, we propose heuristics for (1) 
reducing the cost of constructing the interpolant, (2) mitigating the ill-conditioning 
of the least squares problems, and (3) adding new bases to the approximation. 

5.1. Cost Reduction. The reader may have noticed that, if we measure the 
cost of the residual minimizing scheme in terms of number of evaluations of / from 



(2.1) (assuming we use the finite difference Jacobian described in Section 2.1.1), then 
computing the approximation can be more expensive than evaluating the true model. 
Specifically, an explicit one-step time stepping scheme evaluated at the time discretiza- 
tion ti, . . . ,tm requires m evaluations of /. However, merely constructing the matrix 



G in (2.22) at the same time discretization to compute the initial guess requires mn 
evaluations of / for n bases. And if the elements of the matrix F were not stored 
during the runs, computing them requires another mn evaluations of /. On top of 
that, each Newton step requires yet another mn evaluations. This simple cost anal- 
ysis would seem to discourage us from comparing the interpolation method to other 
methods for model reduction. 

The interpolation method becomes an appropriate method for model reduction 
when one is interested in a small subset of the time domain. For example, if a 
quantity of interest is computed as a function of the state at some final time, then 
the interpolation method can be used to approximate the state at the final time at a 
new parameter value without computing the full history. This is particularly useful 
for dynamical systems with rapidly varying initial transients that require small time 
steps. With the interpolation method, one need not resolve the initial transients with 
a small time step to approximate the state at the final time. Instead, the method 
takes advantage of regularity in the parameter space to construct the interpolating 
approximation. 

Alternatively, if one were interested in a minimal set of points in time for coarse 
approximation of the history, these could be determined with a method similar to 
the discrete empirical interpolation method [7| in the time domain. With this set 
of discretization points, one could approximate the time history at a new parameter 
value without the full model solver. We do not pursue this idea further in this work, 
but we believe it holds promise for approximating time-averaged quantities of interest. 
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This sort of reduction is applicable to the number of points m in the time domain. 
In the parameter domain, the number of points n corresponds to the number of full 
model solutions used in the affine combination (2.4 1. Typically, we think that n is 
small due to the cost of the full model solution. However, in cases where a state vector 
may be represented by an even smaller set of basis vectors, we can borrow an idea 
from moving (or windowed) least squares [27l to reduce the number of columns in the 



solves for the Newton steps (2.14). 



Choose an integer M < n. For a parameter point s, find the Af points in the 
set of {sj} that are nearest to s. Then use only the Xj corresponding to nearby 
Sj to construct the least squares approximations. In the numerical examples, we 
demonstrate the savings generated by this heuristic. 



5.2. Alleviating Ill-Conditioning. In Theorem 4.6 we showed that achieving 
a small residual required the solution of an ill-conditioned least squares problem for the 
Newton step. Here we offer a heuristic for alleviating the effects of the ill-conditioning. 
The heart of the residual minimizing scheme is the constrained linear least squares 



problem (2.141 which we rewrite with a general m x n matrix R as 

minimize ||i?a|| 

a 

subject to 



T 

e a 



1. 



(5.1) 



For now we assume that R is full rank, although it may have a very large condition 
number. The particular form of the least squares problem (the single linear constraint 
e and the zero right hand side) will permit us to use some novel approaches for dealing 
with the ill-conditioning. 

To analyze this problem, we first derive a few useful expressions. Let A be the 
Lagrange multiplier associated with the constraint. The minimizer [a^ , ~^]'^ satisfies 
the KKT conditions 



(5.2) 



We can use a Schur complement (or block elimination) method to derive expressions 
for the solution: 



'R'^R e 




a 




0" 







-X 




1 



A = 



1 



eT{RTR)-ie' 
Also note that the first KKT equation gives 

R'^Ra = Ae. 

Premultiplying this by , we get 

a^R^Ra = A a^e 



a = \(R^R)-^e. 



(5.3) 

(5.4) 
(5.5) 



or equivalently 



\Rar = A. 



(5.6) 



In other words, the value of the optimal Lagrange multiplier is the squared norm 
of the residual. We can use this fact to improve the conditioning of computing the 
approximation while maintaining a small residual. 
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Since we wish to minimize ||i?a||, it is natural to look for a linear combination of 
the right singular vectors of R associated with the smallest singular values. We first 
compute the thin singular value decomposition (SVD) of R as 



R = U^V^ diag(E) = [cri , . . . , cr„ 



and rotate (5.1) by the right singular vectors. Define 

d = V^e. 



d = V'^a, 



Then (5.1) becomes 



minimize 

a 

subject to 
with associated KKT conditions 



IIEall 



(5.7) 



(5.; 



(5.9) 



(5.10) 



Applying block elimination to solve this system involves inverting E^ and multiplying 
by d - an operation with a condition number of (f7i/cr„)^. We can improve the 
conditioning by solving a truncated version of the problem. Let fc < n be a truncation 
and partition 



"E2 


d 




a 




0" 


d^ 







-A 




1 



El 



V = [V, V2] 



(5.11) 



according to k. Since we want to minimize the residual ||Ea|| and the largest values 
of E appear at the top, we set di—i) and solve 



(5.12) 



where the superscript {t) is for truncated] the condition number of inverting and 
multiplying by E2 is ((7fe+i/cr„)^. Then the linear combination of the right singular 
vectors associated with the smallest singular values is 



"E2 d2 




r -(*) 1 




0" 


d^ 








1 



(5.13) 



We can measure what was lost in the truncation in terms of the minimum residual by 
looking at the difference between the Lagrange multipliers 



\Raf-\\Ra^*M 



= lA-A^I 



(5.14) 



And this suggests a way to choose the truncation a priori. Once the SVD of R has 
been computed, we can trivially check the minimum residual norm by computing 



the Lagrange multipliers for various truncations; see (5.6). We want to truncate to 
reduce the condition number of the problem and avoid disastrous numerical errors. 
But we do not want to truncate so much that we lose accuracy in the approximation. 
Therefore, we propose the following. For j — 1, . . . , n, compute 



dj [^n— J + l; ■ • ■ 7 dji\ 

Ej = diag(cr„_j+i , . . . , cr„) 



the last j elements of d 
the last j singular values 
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The Xj measure the norm of the residual for a truncation that retains the last j 



singular values; 
such that 



A„ is the true minimum residual. Therefore, we seek the smallest k 



(5.15) 



for a given tolerance r representing how much minimum residual we are willing to 
sacrifice for better conditioning, and this determines our truncation. 

5.2.1. A Note on Rank Deficiency. There is an interesting tension in the 



constrained least squares problem (5.1 1. The ideal scenario would be to find a vector 
in the null space of R that also satisfies the constraint. This implies that we want R to 
be rank deficient for the sake of the approximation. However, if the matrix [R'^,e]'^ 
is rank deficient, then there are infinite solutions for the constrained least squares 
problem [3]. To distinguish between these two potential types of rank deficiency, we 
apply the following. Suppose R is rank deficient and we have partitioned 



R = [Ui U2] 



"Si 














(5.16) 



Next compute d = V2^e. All components of d equal to zero correspond to singular 
vectors that are in the null space in the constraint; we want to avoid these. Suppose 



d = 



'di' 














(5.17) 



Then we can choose any linear combination of the vectors V2,i that satisfies the 
constraint to achieve a zero residual; one simple choice is to take the component-wise 
average of the vectors ¥2^1- 

If all of c? = 0, then we must apply a method for the rank deficient constrained 



least squares problem, such as the null space method to transform the (5.11 to an 
unconstrained problem coupled with the pseudoinverse to compute the minimum norm 
solution |4]. 

The drawback of this SVD-based approach is that each Newton step requires 
computing the SVD. And we expect that most parameter and uncertainty studies 
will require many such evaluations. Therefore we are actively pursuing more efficient 



methods for solving (5.11 



5.3. Adding Bases. When doing a parameter study on a complex engineering 
system, the question of where in the parameter space to compute the solution arises 
frequently. In the context of this residual minimizing model interpolation, we have 
a natural method and metric for answering this question. Let p* = p* (s) be the 



minimum objective function value from the nonlinear least squares problem (2.71 
with a given basis. Then the point s 
maximizer of 



e 5 to next evaluate the full model is the 



maximize P*{s), 



(5.18) 



so that Xn+i{t) = x{t, Sn+i). Of course, the optimization problem (5.18) is in general 
non-concave, and derivatives with respect to the parameters s are often not avail- 
able. However, we are comforted by the fact that any point in the parameter space 
with a positive value for p* will yield a basis element that will improve the approx- 
imation; see Theorem 4.5 We therefore expect derivative-free global optimization 
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heuristics [16j to perform sufficiently well for many applications. The idea of maxi- 
mizing the residual over the parameter space to compute the next full model solution 
was also proposed in [5j in the context of reduced order modeling, and they present a 
thorough treatment of the resulting optimization problem - including related greedy 
sampling techniques [11] . 

There are many possible variations on (5.18). We have written the optimization 
to chose a single basis to add. However, multiple independent optimizations could be 
run, and a subset of the computed optima could be added to accelerate convergence. 

6. Numerical Examples. We numerically study two models in this section: 
(i) a three-state nonlinear dynamical system representing a chemical kinetics mech- 
anism with a single input parameter controlling the stiffness of the system, and (ii) 
a nonlinear, parabolic PDE modeling conductive heat transfer with a temperature 
dependent random field model of the thermal conductivity. The first problem is a 
toy model used to explore and confirm properties of the approximation scheme; we 
do not attempt any reduction in this case. The second example represents a step 
toward a large scale application in need of model reduction. All numerical exper- 
iments were performed on a dual quad-core Intel W5590 with 12GB of RAM run- 
ning Ubuntu and MATLAB 2011b. Scripts for reproducing the experiments can be 
found at www . Stanford . ed u/~paulcon/rmmi . html, the second experiment requires 
the MATLAB PDE Toolbox. 

6.1. A 3-Species Kinetics Problem. The following model from [24, contains 
the relevant features of a stiff chemical kinetics mechanism. Let 



x{t, s) 



u{t, s) 
v{t,s) 
w{t, s) 



(6.1) 



be the three state variables with evolution described by the function 



s s s s 



(6.2) 



where the parameter s controls the stiffness of the system; the initial state is uq = 
i>o = "Wq = 0.5. We examine the parameter range s e [0.005, 1.2], where the smaller 
value of s corresponds to a stiffer system. Given s, we solve for the evolution of the 
states using MATLAB 's ode45 routine from t = to t = 1. The ODE solver has 
its own adaptive time stepping method. We extract a discrete time process on 300 
equally spaced points in the time interval [0,1]; denote these points in time by tj 
where 



At 



1 

300' 



(6.3) 



In Figure ]6A\ we plot the evolution of each component for ,s = 0.005 and s = 1.2. We 
can see that the stiffness of the equation yields rapidly varying initial transients for s 
near zero, which makes this a challenging problem. 

We approximate the average error and average minimum residual over the pa- 
rameter space by using 300 equally spaced points in the parameter range [0.005, 1.2]; 
denote these points by 



Sk = 0.005 -t- kAs, 



As 



1.195 
300 ■ 



(6.4) 
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(a) 



(b) 



Fig. 6.1: The evolution of the states described by (6.2 ) for (a) s = 0.005 corresponding 
to a stiff system and (b) s — 1.200 corresponding to a nonstiff system. 



The approximate average error and minimum residual are computed as 



£ = AsAt ^^||i(t„Sfe)-2;(tj,Sfc)||, 



k j 



k j 



(6.5) 
(6.6) 



We examine the reduction in £ and TZ as more bases are added according to the 
heuristic in Section |5.3[ and we compare that with the lower bound from Theorem 
4.1 We compute the lower bound from Theorem 4.1 (the right hand side of (4.2)) 
using a Gauss-Legendre quadrature rule with 9,600 points in the range of s, which 
was sufficient to obtain converged eigenvalues. 

We also use the discretization with Sk to approximate the optimization problem 
(5.18) used to select additional basis elements. In other words, for a given basis 



we compute the minimum residual at each s^, and we append to the basis the true 
solution corresponding to the Sk that yields the largest minimum residual. We begin 
with 2 bases ~ one at each end point of the parameter space - and stop after 40 have 
been added. £ and TZ appear in Figure [6?2] compared to the computed lower bound. 
We see that the error behaves roughly like the lower bound. For each evaluation of 
the reduced order model, we track the number of Newton iterations, and we plot the 
average over the 300 Sk in Figure [6?2l Notice that the number of necessary Newton 
iterations decreases as the number of basis functions increases. 

To confirm the results of Theorem |4.6[ we plot the minimum residual at each 



parameter point with the maximum condition number of the matrices Rk from (2.141 



used to compute the Newton steps. In Figure 6.3 we show this plot for 5, 10, 20, and 
40 bases. We clearly see an inverse relationship between the condition number and 
the minimum residual, which is consistent with Theorem 14.61 



6.2. Nonlinear Transient Heat Conduction Study. Next, we examine a 
two-dimensional transient heat conduction model for a steel beam with uncertain 
material properties in a high temperature environment. For a given realization of the 
thermal conductivity, the output of interest is the proportion of the domain whose 
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° Avg Rasid 
Avg Error 

o Lower Bound I" 



5 10 



25 30 35 40 45 



(a) 



Number of bases 



(b) 



Fig. 6.2: (a) Reduction in average minimum residual TZ (6.6), average error £ (6.5) 



and lower bound from (4.2) as bases are added according the heuristic in Section 5.3 



(b) Average number of necessary Newton iterations as basis elements are added. 




Fig. 6.3: Maximum condition number of Rk from ( 
compared to minimum residual for (a) 5, (b) 10, (c) 



2.14) over all Newton iterations 
20, andp)l40 bases. 
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temperature exceeds a critical threshold after 70 seconds. We therefore need the 
temperature at each point in the domain at t = 70 seconds. Given a few simulations of 
the heat transfer for chosen thermal conductivities, the goal of the model interpolation 
is to approximate the output of interest at other possible thermal conductivities. We 
are employing the interpolation at a single point in time, which is the setting where 
we expect to see computational savings. 

6.2.1. Parameterized Model of Thermal Conductivity. In [T5], Kodur and 
co-authors review the high temperature constitutive relationships for steel currently 
used in European and American standards, as well as present the results of various ex- 
perimental studies in the research literature. They discuss the challenges of modeling 
and design given the variation in the standards and experimental data - particularly 
when designing for safety in high temperature environments. We use their work to 
inform a statistical model of the thermal conductivity of steel. 

To construct a statistcal model consistent with the data and codes compiled 
in |15j , we pose a particular model form and choose its parameters to yield good visual 
agreement with the compiled data. For a given temperature T, let Y — Y{T,ijj) be a 
random variable that satisfies 

Y = y(r) + ay(T)Gy(r,w). (6.7) 

The dependence on to signifies the random component of the model; when clear from 
the context, we omit explicit dependence on lo. The function Gy(T,uj) is a standard 
Gaussian random field with zero mean and two-point correlation function 

C{T,,T2) = exp . (6.8) 

The parameter 7 controls the correlation between two temperatures Ti and T2 in the 



model. The apparent smoothness of the experimental data (see Figure 6.4 1 suggests 

, o 

a long correlation length; we choose 7 — v500 C. The temperature dependent mean 
Y{T) is given by the the log of the Eurocode 3 standard detailed in [15], 

^ r log(-0.0333r-l-54) r< 800°C 

^ ' \ log(27.30) T > 800°C ^ ' 

The temperature dependent function ay (T) is used to scale the variance of the model 
to be consistent with the experimental data. In particular, we choose 

cry(r) = 0.08 + G.004\/r. (6.10) 

To model the thermal conductivity k = k(T,uj), we take the exponential 

K = exp(y), (6.11) 

which ensures that realizations of k remain positive. We employ the truncated 
Karhunen-Loeve expansion [17| of the Gaussian process Gy to represent the stochas- 
ticity in K as a set of independent parameters: 

d 

Gy{T,io) « ^0,(T)yA;s,(w), (6.12) 

4=1 
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(a) Conductivity Data (b) Conductivity Model Realizations 



Fig. 6.4: Data and model realizations of temperature dependent thermal conductivity 
with uncertainty. 



where {(j)i,Xi) are eigenpairs of the correlation function C from (6.8), and Si — Si{uj) 
are a set of independent standard Gaussian random variables. The eigenfunctions 



from (6.12 1 are approximated on a uniform discretization of the temperature interval 



[0°C,1250°C] with 600 nodes; this discretization is sufficient to capture the correla- 
tion effects. To compute the approximate (0i,Ai), we solve the discrete eigenvalue 
problem for the symmetric, positive semidefinite correlation matrix associated with 
the discretization of the temperature interval. The exponential decay of the computed 
eigenvalues justifies a truncation of d = 11. 

The random variables Si control the realization of the stochastic model. We can 
then treat them as a set of independent input parameters for uncertainty and sensi- 
tivity analysis. To summarize, we have the following model for thermal conductivity: 



exp 



Y{T) + ay{T)iY,MT) 



(6.13) 



Note that we have been loose with the approximation step in (6.12). In the end, we 



treat the truncated approximation of Gy to be the true statistical model. In figure 



6.4 We plot fifty realizations of the statistical model alongside the log of the data 



and standards collected in [15] . 

6.2.2. Heat Conduction Model. Next we incorporate the statistical model 
for thermal conductivity into a computational simulation. The domain 2? is a cross 
section of a steel beam - shown in Figure 6.5 with labeled boundary segments Fi 
and F2. The boundary temperature is prescribed on Fi to increase rapidly up to a 
maximum value of 1100°C; this represents rapid heating due to a fire. The boundary 
segment on F2 is assigned a zero heat flux condition. 

We set up the problem with the following heat conduction model. For x = 
{xi,X2) e T> and t g [0,70], let T = T{x,t,s) be the time and space dependent 
temperature distribution that satisfies the heat conduction model 



pc 



dT 

at 



-V • (kVT), 



(6.14) 



with boundary conditions 



T^Tb{x,t), xeTi, and - kVT = 0, a; G F2 (6.15) 
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(a) Domain 



(b) Mesh 



Fig. 6.5: Spatial domain V for the heat conduction problem (6.141 and associated 
mesh. 






(a) 



(b) 



(c) 



Fig. 6.6: Temperature distribution at the three different times using the mean trend 
for K. 



where 



Ti,{x,t) = min < 1100, max 



98 

20, — t - 6000a; - 700 
o 



(6.16) 



The space and time dependent Dirichlet boundary condition represents a rapidly 
warming environment; its spatial dependence is plotted for three different times in 
Figure [6?7l Notice that the temperature dependent thermal conductivity k makes the 
model nonlinear. We set the initial condition to T = 20° C throughout the domain. 
The spatial domain V is discretized with 2985 nodes on the irregular triangular 



mesh shown in Figure 6.5 and the solution is approximated in space with standard 
piecewise linear finite elements; all spatial discretization is performed with the MAT- 
LAB PDE Toolbox. The time stepping is performed by MATLAB's odelBs time 



integrator on the spatially semidiscrete form of (6.14). For reference, the tempera- 



ture distribution at three points in time is plotted in figure |6.6| using the mean value 
K = exp(y). 

6.2.3. Model Interpolation Study. To test the residual minimizing model 
interpolation scheme, we set up the following experiment. We first draw 1000 inde- 
pendent realizations of a standard Gaussian random vector with c? = 11 independent 
components. Denote these points Sk G S with k — 1, . . . , 1000 where S is the input 
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(a) 



(b) 



Fig. 6.7: Space and time dependent Dirichlet boundary condition on the boundary 
segment Fi. 



parameter space. For each Sk, we compute the corresponding K{T,Sk), and the re- 
suhing temperature distribution Tk = T(x,t,Sk) with the Matlab solver. From each 
temperature distribution, we compute the fraction of the distribution that exceeds a 
critical threshold r = 1000°C at time t — 70 seconds. Let Qk — Q{sk) be defined by 
the numerical approximation 



Qk 



±I^I{T,>T)dx. (6.17) 



These Qk will constitute the cross-validation data set. 

To construct the interpolant, we draw 20 independent realizations of an 11- 
dimensional standard Gaussian random vector; denote these points by Sj € S. For 
each Sj with j = 1,...,20, we compute Tj = T(x,t, sj). From each time history 
of the temperature distribution we retain the distribution at the final time t = 70 
seconds; these constitute the basis elements for t = 70. We also compute the quantity 
f{Tj) — —V • {nVTj) at time t = 70 for each basis element, which is used to set up 



the nonlinear least squares problems (2.7) 



For each sj. from the cross-validation set, we build the interpolant Tj, — T{x, t = 
70, Sk) using the 20 basis elements. We then compute the fraction of the interpolated 
temperature distribution that exceeds the threshold, 

Qk ~ ^ f I{ fk>T)dx. (6.18) 
Jv 

We compute the error £k = £{sk) with respect to the cross-validation data 

£k = \Qk-Qk\. (6.19) 

Errors are averaged over the Sk, S — (1/1000) J2k^k- 

To test the windowing heuristic from Section |5.H we choose the M = 5 basis 
elements nearest the interpolation point from each basis set. We compute the 
same Qk for each Sk using the smaller basis set, and the error is computed as in 



(6.19) 



To average out some of the effects of randomly choosing an especially good or 
bad basis set with respect to the cross-validation set, we repeat this experiment 10 
times with different randomly chosen basis sets. Errors are averaged over all Sk in the 
cross-validation set and over all 10 expereiments. A histogram of the log of all errors 
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(a) Full interpolation error 



(b) Reduced interpolation error 



Fig. 6.8: Histograms of the log of the error between the interpolants and the cross- 
vahdation data. 




Fig. 6.9: Histogram of the wall clock times for the full and reduced interpolants. 



is shown in figure 6.8 for both the full basis set and the windowed basis set. We see 
that there is practically no difference in error for the smaller windowed basis set. 

In figure |6.9[ we show a histogram of the wall clock timings for the full basis 
set and the smaller basis sets. We see that the basis reduction cuts wall clock time 



by 24% with no practical change in error. In table 6.1 we display the average and 
standard deviations of the wall clock timings for computing the full model (the cross- 
validation data), the full interpolation (all 20 bases), and the reduced interpolant (5 
chosen bases) . We also show the average number of function evaluations in each case; 
the counts of function evaluations for the full model are output by the Matlab odel5s 
solver. For the full and reduced interpolants, the nonlinear least squares solver used 
a maximum of ten Newton iterations for each evaluation. 

7. Conclusions. We have presented a method for approximating the solution of 
a parameterized, nonlinear dynamical system using an affine combination of the time 
histories computed at other input parameter values. The coefficients of the affine com- 
bination are computed with a nonlinear least squares procedure that minimizes the 
residual of the governing equations. In many cases of interest, the computational cost 
is less than evaluating the full model, which suggests use for reduced order modeling. 
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Table 6.1: Timing and function evaluation counts for full model and interpolation. 





Avg. Time (s) 


Std. Time (s) 


Avg. # / Evals 


Full Model 


178.82 


5.59 


6166 


Full Interp. 


2.42 


0.09 


230 


Red. Interp. 


1.84 


0.08 


65 



This residual iniiiimizing scheme has similar error and convergence properties to ex- 
isting reduced basis methods and POD-Galerkin reduced order models; it stands out 
from existing methods for its ease of implementation by requiring only independent 
evaluations of the forcing function of the dynamical system. Also, since we do not 
reduce the basis with a POD type reduction, the approximation interpolates the true 
time history at the parameter values corresponding to the precomputed solutions. 

We proved some interesting properties of this scheme including continuity, con- 
vergence, and a lower bound on the error, and we also show that the value of the 
minimum residual is intimately tied to the conditioning of the least squares prob- 
lems used to compute the Newton stops. We introduced hcnuistics to combat this 
ill-conditioning and further reduce the cost of the method, which we tested with two 
numerical examples: (i) a three-state dynamical system representing kinetics with 
a single parameter controlling stiffness and (ii) a nonlinear, parabolic PDE with a 
high-dimensional random conductivity field. 
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